Telehealth-Enabled In-Home Elbow Rehabilitation for Brachial Plexus Injuries Using Deep-Reinforcement-Learning-Assisted Telepresence Robots

Brachial plexus injuries (BPIs) damage the network of nerves that regulate the muscles and feeling in the shoulder, arm, and forearm, and are known to significantly reduce the function and quality of life of affected persons. According to the World Health Organization (WHO), a considerable share of global disability-adjusted life years (DALYs) is attributable to upper limb injuries, including BPIs. Telehealth can ease access barriers for patients with BPIs, particularly in lower-middle-income nations. This study used deep reinforcement learning (DRL)-assisted telepresence robots, specifically the deep deterministic policy gradient (DDPG) algorithm, to provide in-home elbow rehabilitation with elbow flexion exercises for BPI patients. The telepresence robots were deployed for six months, and DDPG drove the DRL architecture to maximize patient-centric exercises with the robotic arm. Compared to conventional rehabilitation techniques, patients demonstrated an average increase of 4.7% in force exertion and a 5.2% improvement in range of motion (ROM) with the assistance of the telepresence robot arm. According to the findings of this study, telepresence robots are a valuable and practical method for BPI patients' at-home rehabilitation. This technology paves the way for further research and development in telerehabilitation and can be crucial in addressing broader physical rehabilitation challenges.


Introduction
Telehealth emerges as a promising solution to the barriers faced in accessing rehabilitation services for BPIs in lower-middle-income countries. Telehealth can bridge the gap between rural patients and quality healthcare services by leveraging technology. The need for telehealth is further compounded by current global trends, where, according to the World Health Organization, DALYs attributable to upper limb injuries, including BPIs, have seen an estimated increase of 3% over the last decade. In countries like Pakistan, where 64% of the population resides in rural areas [1], implementing telehealth services can significantly reduce the burden of disability and improve the overall quality of life for individuals with BPIs.
Sensors 2024, 24, 1273
Telehealth, a fast-developing area of healthcare, uses technology to remotely provide clinical information and health-related services. This field has many uses, one of which is rehabilitation, where robotics is very important. Through interactive and customized rehabilitation experiences, robotic technologies in telehealth go beyond conventional telemedicine. These remote-controlled robotic systems can allow for more accurate and consistent therapy sessions than traditional in-person therapies offer. In particular, for conditions like BPIs, where consistent and targeted exercises are essential for recovery, this integration of robotics with telehealth has revolutionized patient care.
The brachial plexus is a network of nerves that emerges from the spinal cord, passes through the neck and armpit, and then splits off to become the nerves responsible for upper limb sensation and muscle control. It comprises the roots, trunks, divisions, cords, and branches that innervate the muscles and skin of the chest, shoulder, arm, and hand. BPIs are typically caused by trauma, tumors, or inflammation and can lead to many impairments, including, but not limited to, muscle weakness, loss of feeling, and, in extreme cases, paralysis, as shown in Figure 1. Such injuries can have a significant negative effect on a patient's functionality and quality of life, requiring specialized care and rehabilitation techniques. For instance, motor vehicle accidents, especially motorcycle crashes, account for a substantial percentage of traumatic BPIs. Figure 2 provides a visual summary of the varied mechanisms leading to BPIs, with each panel (A-F) depicting a distinct accident scenario. These injuries can range from minor cases, which might involve stretching of the nerves, to severe cases, such as avulsion, where the nerve roots are torn from the spinal cord. The severity and location of the damage significantly influence the functional outcome. According to [2,3], over 60% of BPI patients suffer from impairments in activities of daily living, with an estimated 27% facing significant chronic pain.
Rehabilitation plays a critical role in the management of BPIs. Early intervention and a well-structured rehabilitation program can significantly improve affected individuals' functional outcomes and quality of life [4]. The rehabilitation process usually involves physical therapy, occupational therapy, and sometimes surgical interventions. Physical therapy focuses on maintaining the range of motion, reducing pain, and strengthening the muscles around the shoulder and arm. Occupational therapy is vital for enabling the patient to regain the ability to perform daily activities [5].
While rehabilitation is crucial, access to quality healthcare and rehabilitation services remains a significant challenge, especially in lower-middle-income countries. Rural areas in these countries often lack the necessary infrastructure and qualified healthcare professionals to provide specialized care for patients with BPIs. Additionally, travel to urban areas with better healthcare facilities is often not feasible due to financial constraints and the debilitating nature of the injury. Consequently, many patients with BPIs in these areas do not receive much-needed rehabilitation services, leading to poor functional outcomes and a reduced quality of life.
Our study distinguishes itself by integrating deep reinforcement learning (DRL), specifically the deep deterministic policy gradient (DDPG) algorithm, with telepresence robots for the in-home elbow rehabilitation of patients with brachial plexus injuries (BPIs). This integration marks a significant advancement not only over conventional rehabilitation methods but also over existing automated or semi-automated systems. Unlike prior studies, our approach leverages the robustness and adaptability of DRL to tailor rehabilitation exercises to individual patient needs, thereby enhancing both the force exertion and range of motion outcomes. Furthermore, the use of low-cost, off-the-shelf components in our telepresence robots positions this study at the forefront of accessible and efficient in-home rehabilitation solutions. By demonstrating the practicality and effectiveness of our DRL-assisted system, this research paves the way for future innovations in telerehabilitation, particularly in addressing the challenges of physical rehabilitation with advanced, yet cost-effective, technology solutions.
Rehabilitation techniques that involve elbow flexion exercises are essential for treating BPIs. By boosting neuronal connections and strengthening the biceps and brachialis muscles, these exercises can help people regain movement in their upper limbs [6]. Regular practice with the robotic arm of a DDPG-driven telepresence robot can result in noticeable improvements, speeding up healing and improving the patient's quality of life.
This research paper is structured into five main sections after the introduction: Literature Review (contextualizing existing studies), Methodology (detailing the development and deployment of DRL-supported telepresence robots), Results (presenting data and findings), Discussion (interpreting results and exploring implications), and Conclusion (summarizing findings and suggesting future research directions).

Literature Review
Telehealth rehabilitation has its roots in the 1960s, when some hospitals and university medical centers started experimenting with telemedicine to reach patients in remote areas. However, it was not until the advent of the internet and advancements in telecommunications technology in the 1990s that telehealth began to take form. Telehealth has been gaining momentum over the past two decades, and its application in rehabilitation is diverse. One notable example is the development of the InMotion ARM, a telehealth robotic system that allows for remote physical therapy for stroke patients [7].
Furthermore, robotic assistance in telerehabilitation has proven to be a groundbreaking advancement. The authors of [8] presented an extensive study on how robots have been instrumental in rehabilitating patients with neuromuscular disorders. This work delves especially into upper and lower limb exercises, stressing the role of robots in helping patients perform high-intensity repetitive tasks, which is crucial for neuroplasticity. The authors also showed how robots can objectively measure patients' movements and improvements.
Jin et al. [9] have demonstrated the efficacy of DRL in improving post-stroke limb function, whereas Majhi and Kashyap [10] have explored adaptive algorithms for patient-specific therapy adjustments. Furthermore, the work by Wang et al. [11] has been instrumental in showcasing how DRL can optimize engagement levels during robotic-assisted therapy. These studies underscore the potential of DRL to enhance the adaptability and personalization of rehabilitation protocols in motor rehabilitation.
Another instance of telehealth rehabilitation is Teleswallowing Rehabilitation, which assists in the remote assessment and management of dysphagia in elderly patients [12,13]. These examples only scratch the surface of what has been accomplished in telehealth rehabilitation, where systems have been developed for various types of physical impairments, speech therapy, and more. Robotics has been a key component in advancing rehabilitation methods. For example, the Lokomat, developed by Hocoma, is a robotic gait therapy device widely used in rehabilitating individuals with spinal cord injuries and stroke [14]. Another example is the ArmeoPower, a robotic exoskeleton for arm and hand rehabilitation, also for patients with stroke or spinal cord injuries [15-17].
Telehealth rehabilitation has also been influential in managing chronic pain, as discussed in [18-20]. This work focuses on internet-delivered treatment for chronic pain management and on how integrating psychological approaches into telehealth platforms has helped improve pain management. Moreover, telerehabilitation is increasingly seen as a viable method for managing cardiovascular diseases. In [21-25], a home-based telerehabilitation program was studied for patients with heart failure. The study showed how telerehabilitation could effectively enhance the exercise capacity and quality of life of patients with chronic heart diseases. An interesting approach is also observed in [26,27], where the authors developed a tele-treatment program for patients with chronic obstructive pulmonary disease (COPD). They designed a service platform that includes exercise, education, and counselling, supported by a triage function. Godine et al. [28] addressed novel approaches in telehealth for behavioral management in individuals with neurological conditions. They discussed various telerehabilitation interventions, such as cognitive behavioral therapy, motivational interviewing, and mindfulness-based stress reduction, that can be leveraged to manage symptoms and enhance the quality of life. Moreover, integration with electronic health records (EHRs) has emerged as an essential feature in telehealth rehabilitation. According to [29], EHR integration enables efficient information sharing, leading to better coordination in care processes.
Interactive online platforms have enabled the remote delivery of physical therapy. Johnson et al. [30] explored an online platform, PhysiTrack, which allowed physical therapists to design personalized exercise programs that patients could access from their homes. The study demonstrated that patients using PhysiTrack showed better exercise adherence and reported higher satisfaction levels than those in traditional physical therapy. Robotics is another domain that has greatly advanced telehealth rehabilitation. In a study by Radder et al. [31], telerehabilitation robotics were shown to be effective in providing intensive task-specific training, especially for stroke patients. The study highlighted that robotic devices could deliver the repetitive training tasks that are often required for neuromuscular rehabilitation. A study by Patel et al. [32] showcased the use of kinematic sensors and smart textiles to remotely monitor patients' movements during physical therapy. These real-time data were critical for providing feedback to both the patient and the therapist, allowing for more targeted and effective therapy. Virtual reality (VR) has been another significant advancement. Laver et al.'s study [33] showed that VR could be effectively employed in telerehabilitation settings, particularly for stroke rehabilitation. Patients using VR systems showed improved physical activity compared to those who underwent conventional therapy.
DRL has been employed in various healthcare applications. For example, Peng et al. [34] utilized DRL for dose optimization in radiation therapy. Another application of DRL is in optimizing treatment plans for patients with chronic conditions such as diabetes, where Prasad et al. [35] used DRL to create personalized insulin plans. One recent development is the use of artificial intelligence (AI) in telerehabilitation. Wade et al. [36] highlighted AI's role in enhancing telerehabilitation outcomes. By incorporating AI algorithms, it is possible to analyze patients' data to create personalized rehabilitation plans that can change dynamically as patients progress. Table 1 describes different published research work with the respective technologies, advantages, and disadvantages.

Table 1. Published research work with the respective technologies, advantages, and disadvantages.
Research Paper: [1]
Technology Used: Telehealth robotic system
Advantages:
• Allows for remote physical therapy.
• Enhances motor recovery in stroke patients.
• Reduces the need for patient travel.
• Adjustable to individual needs.
Disadvantages:
• Limited to upper extremities.
• Requires a reliable internet connection.
• May not be suitable for severe cases.
• Requires trained personnel for operation.
Research Paper: [7]
Technology Used: Robotic exoskeleton

The primary objective of this research is to develop a telepresence robot integrated with a DRL-enabled robotic arm for the in-home elbow rehabilitation of patients with BPIs. The research also evaluates the effectiveness of DRL-supported telepresence robots in improving the range of motion and strength of the affected elbow in a rural setting and assesses the cost-effectiveness of this approach compared to conventional rehabilitation techniques.
This research contributes to the growing body of knowledge in telehealth and robotics for rehabilitation by employing DRL in telepresence robots, a novel application in physical rehabilitation. Specifically, using the DDPG algorithm enables the robots to learn and adapt to patient-specific needs, thus providing a more personalized rehabilitation experience through the robotic arm.

Overview of Deep Reinforcement Learning (DRL)
DRL is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. It essentially trains an agent to make a series of decisions to maximize a cumulative reward through interactions with an environment.
Let us begin by understanding reinforcement learning. In RL, an agent takes actions in an environment to achieve a certain goal. Formally, this is modelled as a Markov decision process (MDP) [37]. An MDP is defined by a tuple (S, A, P, R), where
• S is the state space;
• A is the action space;
• P is the state transition probability;
• R is the reward function, $R: S \times A \to \mathbb{R}$.
At each time step $t$, the agent observes a state, $s_t$, takes an action, $a_t$, receives a reward, $r_t$, and transitions to a new state, $s_{t+1}$. The agent's goal is to learn a policy, π, that maximizes the expected cumulative reward.
The cumulative reward, also known as the return $G_t$, is the discounted sum of the rewards obtained after taking action $a_t$ in state $s_t$, and it can be formally defined in Equation (1), as follows:
$$G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \qquad (1)$$
where γ is the discount factor between 0 and 1.
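As a small illustration of the return defined above, the following sketch computes the discounted sum for a finite list of rewards. The function name and the example reward values are illustrative assumptions and are not part of the study's implementation.

```python
def discounted_return(rewards, gamma):
    """Discounted return for a finite episode: sum over k of gamma**k * rewards[k]."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Accumulate backwards so that each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical rewards from three consecutive time steps.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

A discount factor close to 0 makes the agent short-sighted, whereas a value close to 1 weights later rewards almost as heavily as immediate ones.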
In DRL, deep learning techniques are used to approximate the functions in reinforcement learning. Specifically, deep neural networks are used to approximate either the policy π (called policy networks) or the value functions V or Q (called value networks).
One of the most common algorithms in DRL is the Deep Q-Network (DQN) [38], which is based on Q-learning. Q-learning learns the Q-function, which is the expected return when taking action a in state s and following policy π thereafter. The Q-function is defined in Equation (2), as follows:
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid s_t = s,\ a_t = a \right] \qquad (2)$$
where $G_t$ denotes the return from time step $t$. In DQN, a deep neural network is used to approximate the Q-function. The network is trained by minimizing the difference between the predicted Q-value and the target Q-value, which is calculated using the Bellman equation as defined here in Equation (3):
$$y = r + \gamma \max_{a'} Q(s', a') \qquad (3)$$
where $s'$ is the next state and $a'$ ranges over the actions available in it.
Another important class of algorithms in DRL is policy gradients. Instead of learning a value function, policy gradient methods directly learn a policy. The objective is to find the policy that maximizes the expected return, as is stated here in Equation (4):
$$J(\theta) = \mathbb{E}_{a \sim \pi_{\theta}(a \mid s)}\left[ G_t \right] \qquad (4)$$
where θ are the parameters of the policy, and $\pi_{\theta}(a \mid s)$ is the probability of taking action a in state s under policy π.
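The Bellman target described above can be sketched with a tabular Q-function, which stands in here for the deep network that DQN would use. This is a minimal sketch under stated assumptions: the state and action labels, the learning rate, and the helper names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def td_target(q, reward, next_state, actions, gamma, done):
    """Bellman target: r + gamma * max over a' of Q(s', a'); just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(q[(next_state, a)] for a in actions)


def q_update(q, state, action, reward, next_state, actions,
             gamma=0.9, lr=0.5, done=False):
    """Move Q(s, a) a fraction lr toward the Bellman target and return the TD error."""
    target = td_target(q, reward, next_state, actions, gamma, done)
    td_error = target - q[(state, action)]
    q[(state, action)] += lr * td_error
    return td_error


# Hypothetical two-action problem; unseen (state, action) pairs default to 0.
q_table = defaultdict(float)
actions = ["flex", "hold"]
q_table[("s1", "flex")] = 2.0
error = q_update(q_table, "s0", "flex", reward=1.0, next_state="s1", actions=actions)
print(error, q_table[("s0", "flex")])
```

In DQN, the table lookup is replaced by a forward pass through the network, and the squared TD error serves as the training loss.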
Actor-critic methods combine value-based and policy-based methods. The actor is the policy model, and the critic evaluates the action taken by the actor. The actor uses policy gradients, and the critic is updated using methods like Q-learning. Trust region policy optimization (TRPO) and proximal policy optimization (PPO) are advanced policy gradient methods. TRPO constrains each update so that the policy does not change too much, which stabilizes learning, and PPO is a simplified version of TRPO that is more efficient.
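One common way to keep each policy update small is the clipped surrogate term of the PPO-Clip variant, sketched below for a single sample. The source text does not specify which PPO variant is meant, so this is an assumption made for illustration; the clipping range of 0.2 is a commonly used default rather than a value from this study.

```python
def ppo_clip_term(ratio, advantage, epsilon=0.2):
    """Per-sample PPO-Clip surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is pi_new(a|s) / pi_old(a|s) and advantage is the estimated advantage.
    """
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# A large ratio with a positive advantage is capped at (1 + epsilon) * advantage.
print(ppo_clip_term(ratio=1.5, advantage=2.0))
```

Because the term stops growing once the probability ratio leaves the clipping range, the optimizer gains nothing from pushing the new policy far from the old one in a single update.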

Deep Deterministic Policy Gradient (DDPG) Algorithm
DDPG is an algorithm that falls under the category of actor-critic methods in DRL [39]. DDPG is designed to handle environments with continuous action spaces, making it suitable for various real-world applications such as robotics and autonomous systems.
Value-based methods such as DQN work well with discrete action spaces but struggle with continuous action spaces, because selecting the best action requires evaluating the Q-function over an infinite number of actions. DDPG overcomes this by adapting the DQN algorithm for continuous action spaces. Instead of outputting Q-values for each possible action, the actor network in DDPG outputs the action directly. DDPG is essentially an off-policy algorithm that approximates the deterministic policy gradient (DPG) with deep neural networks, hence the name deep deterministic policy gradient.
DDPG has two primary components, an actor and a critic.
• Actor: The actor is a neural network that takes the current state as input and outputs a continuous action or set of actions. The actor's role is to learn the optimal policy function.
• Critic: The critic evaluates the action output by the actor by computing the Q-value. The critic's role is to learn the optimal value function.
Both the actor and the critic have their own neural networks. Moreover, DDPG employs target networks for both the actor and the critic, which are copies of their respective networks. The target networks are used to calculate target values during learning and are updated slowly to maintain stability.
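The slow update of the target networks is commonly implemented as a soft (Polyak) update, in which the target parameters move a small step toward the online parameters after each learning step. The sketch below assumes this scheme and represents each network as a flat list of parameters; the mixing coefficient `tau` and its default value are illustrative assumptions, not values reported in this study.

```python
def soft_update(target_params, online_params, tau=0.01):
    """Return new target parameters: tau * online + (1 - tau) * target."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [tau * w + (1.0 - tau) * w_targ
            for w, w_targ in zip(online_params, target_params)]


# Hypothetical parameter vectors for one network and its target copy.
target = [0.0, 0.0]
online = [1.0, 2.0]
target = soft_update(target, online, tau=0.5)
print(target)  # [0.5, 1.0]
```

A small `tau` makes the targets change slowly, so the values the critic is regressed toward stay nearly fixed between consecutive updates.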
The critic network is updated using the Bellman equation, as in Q-learning. Given a tuple $(s, a, r, s')$, where $s$ is the current state, $a$ is the action taken, $r$ is the reward, and $s'$ is the next state, the target Q-value is computed as demonstrated here in Equation (5):
$$y = r + \gamma\, Q'\left(s', \mu'(s')\right) \qquad (5)$$
where γ is the discount factor, $Q'$ is the critic's target network, and $\mu'$ is the actor's target network. The TD error is the difference between the target Q-value and the estimated Q-value, as stated here in Equation (6):
$$\delta = y - Q(s, a) \qquad (6)$$
The actor's objective is to maximize the expected Q-values. Gradient ascent on the policy is performed using the deterministic policy gradient theorem. The gradient of the objective function, J, with respect to the actor parameters, θ, is described in Equation (7), as follows:
$$\nabla_{\theta} J \approx \mathbb{E}_{s}\left[ \nabla_{a} Q(s, a)\big|_{a = \mu_{\theta}(s)}\, \nabla_{\theta}\, \mu_{\theta}(s) \right] \qquad (7)$$
where $\mu_{\theta}$ is the actor network. In other words, the actor is updated along the gradient of the Q-value with respect to the action, multiplied by the gradient of the action with respect to the actor's parameters.
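The critic target, the TD error, and the actor gradient described above can be traced by hand on a deliberately tiny model. The sketch below replaces the neural networks with a one-parameter linear actor and a three-weight critic over scalar states and actions; these forms, the initial weights, and the learning rates are assumptions made purely for illustration and do not reflect the architecture used in this study.

```python
from dataclasses import dataclass, field


def _initial_critic_weights():
    # Hypothetical starting weights for Q(s, a) = w0*s + w1*a + w2*a*a.
    return [0.0, 1.0, -0.5]


@dataclass
class LinearDDPG:
    theta: float = 0.5                    # actor: mu(s) = theta * s
    w: list = field(default_factory=_initial_critic_weights)
    theta_targ: float = 0.5               # target actor parameter
    w_targ: list = field(default_factory=_initial_critic_weights)
    gamma: float = 0.9

    @staticmethod
    def q_value(w, s, a):
        return w[0] * s + w[1] * a + w[2] * a * a

    def td_target(self, r, s_next):
        """y = r + gamma * Q'(s', mu'(s')), using the target networks."""
        a_next = self.theta_targ * s_next
        return r + self.gamma * self.q_value(self.w_targ, s_next, a_next)

    def td_error(self, s, a, r, s_next):
        """delta = y - Q(s, a), using the online critic."""
        return self.td_target(r, s_next) - self.q_value(self.w, s, a)

    def actor_gradient(self, s):
        """dQ/da evaluated at a = mu(s), times dmu/dtheta."""
        a = self.theta * s
        dq_da = self.w[1] + 2.0 * self.w[2] * a
        dmu_dtheta = s
        return dq_da * dmu_dtheta

    def update(self, s, a, r, s_next, lr_critic=0.1, lr_actor=0.1):
        delta = self.td_error(s, a, r, s_next)
        features = [s, a, a * a]          # gradient of Q with respect to w
        # Critic step: reduce the squared TD error.
        self.w = [wi + lr_critic * delta * f for wi, f in zip(self.w, features)]
        # Actor step: gradient ascent on Q with the freshly updated critic.
        self.theta += lr_actor * self.actor_gradient(s)
        return delta


agent = LinearDDPG()
delta = agent.update(s=1.0, a=0.5, r=1.0, s_next=1.0)
print(delta, agent.theta)
```

A full implementation would additionally sample minibatches from a replay buffer, add exploration noise to the actor's output, and apply the slow target-network update after each step.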

Telepresence Robots
Telepresence robots represent a remarkable intersection of robotics, communication technology, and human-computer interaction, as shown in Figure 3. They facilitate a sense of presence, or being there, for people who are geographically distant from each other. Telepresence robots are often equipped with a display, camera, speakers, and microphones, which enable video conferencing, and with motors for mobility.
The design parameters of our telepresence robot are determined by the wheeled motion of the vehicle, such as turning angle, velocity, and angular momentum. The telepresence robot's parameters were chosen after an analysis of the kinematic motion of the robot, as shown in Table 2.
The concept of telepresence robots revolves around extending a person's ability to participate in distant environments virtually [40]. These robots are employed in various sectors, including healthcare [41], education [42], business [43], and social interactions [44]. In healthcare, for instance, they enable doctors to interact with patients in remote locations. The major components of a telepresence robot are the following:
1. Mobility and Navigation: Most telepresence robots have wheels and can move around. They use various sensors, such as LIDAR or ultrasonic sensors, for navigation. The robot's mobility can be controlled by a remote user or automated using algorithms.
2. Communication: This is central to the concept of telepresence. Robots usually have a camera, microphone, and speakers that facilitate video conferencing. The transmission of audio-visual data should be in real time or with minimal latency.
3. Robotic Arm: The telepresence robot is equipped with a robotic arm that assists the BPI patient in elbow flexion.
4. User Interface: Telepresence robots usually have an interface allowing remote users to control them. This could be through a web application, desktop software, or even a mobile app.
5. Autonomy and Battery Life: Since these robots are mobile, they need to be battery-powered. Battery life and the ability to autonomously return to a charging station when the battery is low are important considerations.
Continuous monitoring is an essential component of rehabilitation. Telepresence robots equipped with robotic arms can remotely assist patients' physical activities involving the affected arm and monitor their vital signs. These robots can be programmed to conduct regular check-ins with patients, ensure that they adhere to their rehabilitation program, and relay this information to healthcare professionals.
Figure 4 illustrates the core general components of our telepresence robot's hardware architecture. Central to the system is the computer system, encompassing a robust microprocessor and microcontroller, which orchestrates the device's operations. Peripheral modules include input/output devices, such as cameras and microphones for sensory data acquisition, and an LCD for display. Actuation mechanisms are driven by motor drivers and motors, powered by an integrated battery system. The charging dock ensures continuous operation, whereas the indicator light provides real-time status feedback. This schematic is pivotal for replicating the robot's hardware setup in further studies.
Our telepresence robot was constructed using lightweight fiber materials, chosen for their balance of cost-effectiveness and durability. These materials are capable of supporting a human hand weight of up to 30 kg, making them ideal for our rehabilitation application. The robot's sensing capabilities are a cornerstone of its functionality. We employed force-sensing resistors (FSRs) interfaced with an Arduino to compute the force exerted by a patient's hand. This setup is crucial for monitoring the rehabilitation progress and adjusting the exercises accordingly. Additionally, for joint movement detection and angle calculation, our system utilized Adafruit flex sensors. These sensors provide feedback on the angles and movements of the robotic arm's joints, enabling detailed tracking and adaptation of the rehabilitation process.
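To make the sensing chain more concrete, the following sketch shows how raw analog readings from an FSR and a flex sensor could be converted into a resistance and a joint angle on the host side. It is not the firmware used in this study. It assumes a 10-bit analog-to-digital converter, a 5 V supply, an FSR wired between the supply and the analog pin with a 10 kΩ fixed resistor to ground, and a two-point linear calibration of the flex sensor; all of these values and function names are hypothetical.

```python
def adc_to_voltage(counts, vcc=5.0, max_counts=1023):
    """Convert an ADC reading (0..max_counts) to volts."""
    return vcc * counts / max_counts


def fsr_resistance(counts, r_fixed=10_000.0, vcc=5.0, max_counts=1023):
    """FSR resistance in ohms from the voltage across the fixed divider resistor.

    Divider relation: v_out = vcc * r_fixed / (r_fsr + r_fixed).
    """
    v_out = adc_to_voltage(counts, vcc, max_counts)
    if v_out <= 0.0:
        return float("inf")  # no measurable pressure: the FSR is effectively open
    return r_fixed * (vcc - v_out) / v_out


def flex_angle(counts, counts_straight, counts_bent, angle_bent=90.0):
    """Map a flex-sensor reading to degrees using two calibration readings."""
    span = counts_bent - counts_straight
    if span == 0:
        raise ValueError("calibration readings must differ")
    return angle_bent * (counts - counts_straight) / span


# Hypothetical readings: mid-scale FSR value and a half-bent flex sensor.
print(fsr_resistance(511.5))
print(flex_angle(600, counts_straight=500, counts_bent=700))
```

Converting the FSR resistance into a force in newtons requires a calibration curve for the specific sensor, which is deliberately left out here because no such curve is given in the text.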
To maintain the focus on accessibility and affordability, especially in lower-middle-income countries, we opted for off-the-shelf electronic components. This decision not only demonstrates the feasibility of our approach but also ensures that our system can be replicated and utilized in various settings with minimal cost barriers.
Regarding the conventional rehabilitation methods used for comparison in our study, these sessions employed a standardized approach to measure force exertion, using equipment and methodologies comparable to those of the robotic system. This comparative analysis is vital to demonstrate the effectiveness of our telepresence robot system against traditional methods and thereby highlights the potential impact of our research in the field of rehabilitation technology.

Operation of the Telepresence Robot for Elbow Flexion Exercises
Telepresence robots equipped with sensor-integrated robotic arms are a cutting-edge technology for rehabilitation exercises, especially for elbow flexion in patients with upper limb impairments such as BPIs. A flowchart of the principles governing the operation of a telepresence robot arm for assisting in elbow flexion exercises is shown in Figure 5.

Sensing Phase
The sensing phase is the initial and critical component in the functioning of telepresence robotic arms for rehabilitation. Sensors are the cornerstone of the robotic arm's functionality: they detect the physical interactions of the patient with the robotic arm and convert them into data that the robot's computational system can process. In elbow flexion exercises, the critical information being sensed is the force exerted by the patient's arm, which the telepresence robotic arm measures with force sensors.
Types of Sensors
1. Force Sensors: Force-sensing resistors (FSRs), like the load cell or piezoelectric force sensor, measure the amount of force exerted on the robotic arm. A load cell typically uses a strain gauge that changes its electrical resistance when deformed by force. A piezoelectric sensor, by contrast, generates an electric charge in response to applied mechanical stress. The force sensor specifications are discussed in Table 3.

Table 3. Technical Specifications and Details
Sensor Type: Force-Sensing Resistor (FSR) (piezoelectric sensor)

2. Position and Angle Sensors: Since the robot needs to know the arm's position and the elbow joint's angle, position sensors and potentiometer-based angle sensors are used. These sensors give information about the spatial configuration of the patient's arm, which is vital for adjusting the assistance provided. Their specifications are discussed in Table 4.

Mathematical Equations and Relations
1. Force Sensors: For strain gauge-based force sensors, the relative change in resistance, ∆R/R, is proportional to the strain, ε, which is proportional to the force, F, applied. This relationship can be expressed in Equation (8), as follows:

$$\frac{\Delta R}{R} = k\,\varepsilon = k\,\frac{F}{A\,E} \tag{8}$$

where R = unstrained resistance of the gauge, k = gauge factor (dimensionless constant), ε = strain, F = force applied, A = cross-sectional area through which force is applied, and E = Young's modulus of the material.

2. Position and Angle Sensors: Resistance varies linearly with the rotation angle for potentiometer-based angle sensors. If R0 is the resistance at 0 degrees and Rmax is the maximum resistance at the maximum rotation angle, the relationship can be expressed in Equation (9), as follows:

$$R(\theta) = R_0 + \left(R_{\max} - R_0\right)\frac{\theta}{\theta_{\max}} \tag{9}$$

where R(θ) = resistance at angle θ, θ = current angle of rotation, and θmax = maximum angle of rotation.
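The two sensor relations can be inverted to turn raw resistance readings into the quantities the controller needs. The following is a minimal sketch, assuming the strain gauge follows ∆R/R = k·F/(A·E) and the potentiometer follows R(θ) = R0 + (Rmax − R0)·θ/θmax. The function names and all numeric values are illustrative assumptions and are not taken from Tables 3 and 4.

```python
def force_from_strain_gauge(delta_r, r_nominal, gauge_factor, area, youngs_modulus):
    """Estimate applied force (N) by inverting dR/R = k * F / (A * E)."""
    strain = (delta_r / r_nominal) / gauge_factor
    return strain * area * youngs_modulus


def angle_from_potentiometer(resistance, r0, r_max, theta_max):
    """Estimate rotation angle by inverting R(theta) = R0 + (Rmax - R0) * theta / theta_max."""
    return (resistance - r0) / (r_max - r0) * theta_max


# Illustrative (hypothetical) readings and sensor constants.
force = force_from_strain_gauge(
    delta_r=0.12,          # ohm, measured change in resistance
    r_nominal=120.0,       # ohm, unstrained gauge resistance
    gauge_factor=2.0,      # dimensionless
    area=1e-5,             # m^2
    youngs_modulus=2e11,   # Pa
)
angle = angle_from_potentiometer(
    resistance=5500.0, r0=1000.0, r_max=10000.0, theta_max=180.0
)
print(f"force = {force:.1f} N, angle = {angle:.1f} deg")
```

In practice, both conversions would be preceded by an analog-to-digital step and a calibration pass, which this sketch leaves out.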

Deep Deterministic Policy Gradient (DDPG) Phase
As previously discussed, DDPG is an algorithm that can handle continuous action spaces and is, hence, suitable for the complex movements involved in physical rehabilitation. The DDPG algorithm in the telepresence robot consists of two main neural network components: the actor and the critic.
The state formed in the previous phase is fed into the actor network, which then suggests an action, in this case the appropriate amount of assistive force to apply. The critic network, on the other hand, evaluates the predicted Q-value of taking that action in the given state.
These networks are trained to maximize the expected cumulative reward, where the reward could be based on how effectively the robotic arm assisted the patient in achieving elbow flexion without straining the muscles.
To calculate the force that the telepresence robot needs to apply to the patient's arm to assist with the elbow flexion exercise, we need to consider factors like the force exerted by the patient, the desired trajectory for the movement, and the dynamics of the patient's arm. The telepresence robot can use the DDPG algorithm to determine the optimal force to apply.
Let us denote the following:
• Fp: Force exerted by the patient (measured using sensors, as described previously);
• m: Mass of the patient's forearm and arm;
• a: Desired acceleration of the patient's arm during the exercise;
• g: Gravitational acceleration (9.81 m/s²);
• θ: Angle between the forearm and the vertical movement;
• µ: Coefficient of friction between the patient's arm and the robot's arm;
• Ft: Force exerted by the telepresence robot on the patient's arm.

1. Desired Acceleration (a): The desired acceleration can be determined based on the trajectory planned for the elbow flexion movement. The DDPG algorithm considers various factors, including the current state of the patient's arm, the desired state, and other constraints to compute the desired acceleration.

2. Frictional Force (Ff): The friction between the robot's arm and the patient's arm needs to be considered, as given in Equation (10).

3. Force Required for Desired Acceleration (Fa): From Newton's second law, the force required to achieve the desired acceleration is given by Equation (11), as follows:

$$F_a = m\,a \tag{11}$$

4. Force to Counteract Gravity (Fg): The component of the gravitational force in the direction of the movement is described in Equation (12).

5. Total Force by Telepresence Robot (Ft): The total force that the robot needs to apply is the sum of the force required for the desired acceleration, the force to counteract gravity, and the frictional force. Additionally, the force exerted by the patient (Fp) needs to be considered, as given in Equation (13):

$$F_t = F_a + F_g + F_f - F_p \tag{13}$$

The force Ft calculated in this way is what the telepresence robot needs to exert on the patient's arm to assist in the elbow flexion exercise. The DDPG algorithm can be used to compute and adjust the desired acceleration in real time based on sensory feedback and to ensure smooth and effective movement.
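The force balance described above can be sketched as a small function. This is an illustration under stated assumptions, not the authors' implementation: the frictional force and the gravity component are passed in as already-computed inputs rather than derived here, and the patient's own force is assumed to be subtracted, so the robot supplies only the remainder. The function name and the numeric values are hypothetical.

```python
def robot_assist_force(mass, accel, f_gravity, f_friction, f_patient):
    """Return the assistive force Ft (N) the robot should apply.

    Assumes Ft = Fa + Fg + Ff - Fp, with Fa = m * a (Newton's second law).
    f_gravity and f_friction are taken as given inputs.
    """
    f_accel = mass * accel
    return f_accel + f_gravity + f_friction - f_patient


# Hypothetical example: 1.5 kg forearm, gentle 0.2 m/s^2 target acceleration.
ft = robot_assist_force(
    mass=1.5, accel=0.2, f_gravity=10.0, f_friction=0.5, f_patient=4.0
)
print(f"assistive force = {ft:.2f} N")
```

As the patient's force Fp grows over the course of rehabilitation, the same balance yields a smaller Ft, which matches the intended reduction in robot assistance.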

Action Execution Phase
Once the DDPG algorithm decides on the assistive force, the robotic arm applies it to facilitate the patient's upward movement. This is done through actuators in the robotic arm, which can exert force. The actuators could be based on electrical motors or hydraulic systems.

Feedback and Learning Phase
After the action is executed, the new state is observed along with the reward. This information is fed back into the DDPG algorithm to update the actor and critic networks. This learning phase is vital for adapting the robotic arm's assistance over time to match the patient's progress.
In the scenario considered, a telepresence robot equipped with a robotic arm assists a patient's arm in performing an elbow flexion exercise. The robotic arm has sensors to measure the force exerted by the patient, and it uses a DDPG algorithm to estimate the force the robot must apply to assist the movement. The DDPG procedure for telepresence robots is given in Algorithm 1.

Algorithm 1. DDPG for telepresence robots
Initialize the actor network $\mu(s \mid \theta^{\mu})$ and the critic network $Q(s, a \mid \theta^{Q})$
Initialize the target networks $\mu'$ and $Q'$ and the replay buffer R
for each episode do
    Observe the initial state s
    for each timestep do
        Select action a from the current actor policy plus exploration noise
        Execute a; observe the reward r and the new state s′
        Store (s, a, r, s′) in replay buffer R
        Sample a random minibatch of N transitions $(s_i, a_i, r_i, s_{i+1})$ from R
        Calculate target Q-value using target networks: $y_i = r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)$
        Update the Critic network by minimizing the loss: $L = \frac{1}{N} \sum_i \big(y_i - Q(s_i, a_i \mid \theta^{Q})\big)^2$
        Update the Actor policy using the sampled policy gradient: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}$
        Soft update target networks: $\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'}$
    end for
end for

Here γ is the discount factor, τ is the soft-update rate, and N is the minibatch size.

The pseudocode above starts by initializing the actor and critic networks, their target networks, and the replay buffer. A loop then runs through episodes (iterations of learning), where each episode represents an elbow flexion exercise session. For each timestep within an episode, the robot chooses an action based on the current policy of the actor network plus some noise for exploration. The action is the force applied by the robotic arm. The robot then executes the action and observes the reward and the new state. The reward can be designed based on the success of the movement, its smoothness, and patient feedback. The experience is stored in the replay buffer, a minibatch of experiences is sampled from the buffer, and the target Q-value is computed using the target networks. The critic network is updated by minimizing the difference between the target and estimated Q-values, and the actor network is updated by performing policy gradient ascent to maximize the expected reward. Finally, the target networks are softly updated toward the actual networks. The process is repeated for many episodes until the learning converges and the robot can efficiently assist in the elbow flexion exercises.
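The bookkeeping steps of the procedure described above (storing experience, computing the target Q-value, evaluating the critic loss, and softly updating the target networks) can be sketched without a deep learning framework. This is a minimal illustration, not the study's implementation: network parameters are represented as plain lists of floats, the actor gradient step is omitted because it requires automatic differentiation, and the discount factor `gamma`, the update rate `tau`, and all names are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.data.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random minibatch without replacement.
        return random.sample(list(self.data), min(batch_size, len(self.data)))


def td_target(reward, next_q, gamma=0.99):
    """Target y = r + gamma * Q'(s', mu'(s')); next_q comes from the target networks."""
    return reward + gamma * next_q


def critic_loss(targets, q_values):
    """Mean squared error between targets y_i and critic estimates Q(s_i, a_i)."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def soft_update(target_params, online_params, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta' for every parameter."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Hypothetical usage: states are (patient force, elbow angle), actions are forces in N.
buffer = ReplayBuffer(capacity=1000)
buffer.store((4.0, 30.0), 6.5, -2.0, (4.2, 35.0))
buffer.store((4.2, 35.0), 6.0, -1.5, (4.5, 41.0))
batch = buffer.sample(batch_size=2)
print(len(batch), td_target(1.0, 2.0, gamma=0.5), soft_update([0.0], [1.0], tau=0.1))
```

A small `tau` makes the target networks trail the learned networks slowly, which is the mechanism DDPG relies on to keep the critic's targets stable.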

Experimental Setup
The experimental setup for this research is designed to evaluate the efficacy of using a telepresence robot to assist patients with BPIs in performing elbow flexion exercises. This setup is split into two environments: the patient's home, where the telepresence robot is physically present, and the remote location of the healthcare provider, which may be a clinic or hospital.
Table 5 presents a demographic overview of 406 patients with traumatic brachial plexus lesions. It shows a significant male predominance, with men constituting 94.6% of cases. The average age of the patients was approximately 28 years. The majority of injuries were caused by motorcycle accidents, accounting for 79% of cases. Lesions were more frequent in the left plexus. The data also indicate a variety of lesion types and associated injuries, including head trauma and bone fractures, highlighting the complexity and severity of these injuries.
• Stage 1: Preparation and Positioning. Before the interaction, the patient needs to be appropriately positioned. The robot should be adjustable so that its arm is at the same height as the patient's arm while the patient lies in bed.
• Stage 2: Calibration of Robotic Arm. Before the exercise, the robotic arm needs to be calibrated to ensure the sensors accurately capture the force the patient applies. This might include adjusting the sensitivity of the sensors and making sure the robot's arm mimics a human arm's natural range of motion.
• Stage 3: Initial Grip and Force Measurement. The patient grips the robotic arm, and an initial force measurement is taken to establish the baseline strength of the patient's grip and upward force. This baseline is essential for the DDPG algorithm to understand how much assistance is needed.
• Stage 4: Elbow Flexion Exercise. As the patient attempts to move their arm upward for the elbow flexion exercise, the force sensors on the robotic arm continuously measure the amount of force being exerted by the patient.
• Stage 5: Assistance from Robotic Arm. Simultaneously, the DDPG algorithm processes the sensor data and calculates the appropriate amount of assistance. The robotic arm then exerts a controlled force that aids the patient in moving their arm upward. This assistance is dynamically adjusted in real time based on the force the patient is applying.
• Stage 6: Verbal Interaction and Encouragement. The telepresence robot may also have a speaker and microphone, allowing for verbal communication between the patient and the healthcare provider. The healthcare provider can offer live feedback, instructions, and encouragement to the patient through the robot.
• Stage 7: Completion and Data Logging. Once the exercise is complete, the robotic arm gently lowers the patient's arm back to the initial position. The data regarding the forces exerted by the patient and the assistance provided by the robotic arm are logged for further analysis.
• Stage 8: Post-Exercise Feedback. After the exercise, the patient might be asked to provide feedback on the difficulty of the exercise and the effectiveness of the assistance provided by the robotic arm. This feedback can be useful for calibrating the robot for future sessions.
All the stages above are shown in Figure 6, with the respective angle movement of the arm according to the base of the bed upon which the patient is lying for exercise.
Sensors 2024, 24, x FOR PEER REVIEW 16 of 25

• Computer Setup: The healthcare provider uses a computer with internet connectivity.
• Software Interface: A specialized software interface is installed on the computer, which allows the healthcare provider to connect to the telepresence robot remotely.
• Display: A dual-monitor setup allows for the simultaneous viewing of the patient through the robot's camera and real-time statistics.

Doctor Interaction
In the telepresence robot system context, the doctor interaction component is crucial for enabling healthcare professionals to remotely monitor and guide the rehabilitation of patients with BPIs, as shown in Figure 7. The dashboard depicted therein is an authentic representation, demonstrating the interface through which real-time data and patient engagement are managed, providing a tangible example of the system's application in clinical settings. This component integrates seamlessly into the system, allowing for real-time communication and data analysis. Both data collection and the analysis process for a telepresence robot used in patient rehabilitation involve several stages and types of data.
Data collection includes using sensors, video and audio technology, interaction records, and feedback data. Sensors attached to the robot capture force, range of motion, and kinematic data during a patient's elbow flexion exercises. Video and audio data are collected through the robot's camera and microphone, allowing the doctor to observe and communicate with the patient remotely. Interaction data pertain to the patient's interaction with the robotic arm, including the levels of assistance the robot provides and any manual adjustments the doctor makes. Patient feedback data are obtained through standardized questionnaires, assessing parameters such as pain, comfort, and the perceived efficacy of the exercises.
The analysis of the data collected occurs in real time and longitudinally, informing algorithm optimization, statistical studies, and comprehensive reporting. Real-time analysis involves adjusting the assistance level based on force and motion data and enabling immediate medical intervention through remote monitoring. Longitudinal analysis monitors trends in the patient's progress over time and helps identify the evolution of pain and comfort perception. The data are also utilized to optimize the DDPG algorithm controlling the robotic arm, adjusting it to better suit patient needs over time. Statistical analysis is used to uncover patterns in the data, such as a significant improvement in the range of motion over a certain period. Finally, the analysis results are compiled into reports with graphical representations, summary statistics, and insights on the patient's progress, to be shared with patients and other healthcare providers.

Results and Discussion
The results section evaluates the effectiveness of the DRL-assisted telepresence robot in improving patients' BPI condition through elbow flexion exercises. The primary parameters used to assess improvement are the patient's force, the robotic arm's assistance force, and the range of motion (ROM).
Comparing our robot's effectiveness with a control group undergoing conventional rehabilitation revealed noteworthy outcomes. Our experimental group, mirroring the control group's demographics and injury types, exhibited a 4.7% increase in force exertion and a 5.2% improvement in range of motion (ROM). These statistics validate the robot's efficacy in enhancing rehabilitation outcomes compared to traditional methods. Detailed analysis is presented in [subsection], underlining the scientific merit of these findings.

DDPG Algorithm Analysis
The DDPG agent demonstrated remarkable accuracy in following various input references, with a minimal error margin of only 0.1%. This was exhibited without chattering or instability in the controller input, indicating a stable control process, as illustrated in Figure 8. This stability and accuracy indicate that the DDPG network is suitable for practical use. In Figure 8, the red dotted line represents the desired input reference, the blue dotted line shows the actual input from the DDPG agent, and the solid green line indicates the error margin. The close alignment of the actual input to the reference with minimal errors demonstrates the precision and stability of the DDPG algorithm in controlling the system.
While the reward history exhibited high variance, primarily due to the stochastic policy used for exploration in each episode, the average reward trendline showed an increase within just 50 episodes, as seen in Figure 9. After this rise, the reward levels fluctuated between −4 and −2, indicating oscillation in the reward process.

Improvement in Force Exerted by Patient
One of the primary objectives of the elbow flexion exercises assisted by the telepresence robot was to observe the progression of the patient's force. This force signifies the ability of the patients to engage their muscles during the exercise.
Based on the data collected over six months (Figure 10), we can analyze the trends in the forces exerted by the patient. Let us denote the force (in N) exerted by the patient at time t as F(t), where t is measured in months. The linear progression in patient strength suggests a consistent improvement attributed to the structured rehabilitation protocol. Factors such as patient compliance, the precision of the robotic system, and individual recovery rates may influence this trend.
We can model the force exerted as a linear function of time using a simple linear regression, as shown in Equation (14):

$$F(t) = a\,t + b \tag{14}$$

where a represents the rate of increase in force with respect to time (slope), and b is the y-intercept, which represents the initial force exerted by the patient. By applying the method of least squares, we can estimate the values of a and b. Given the data, we estimate that a ≈ 1.43 N/month and b ≈ 3.6 N.
Therefore, our model for the force exerted by the patient with respect to time is approximately as follows, in Equation (15):

$$F(t) \approx 1.43\,t + 3.6 \tag{15}$$

This model indicates that, on average, the force exerted by the patient increased by approximately 1.43 Newtons per month.
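The least-squares estimate of the slope and intercept can be sketched as follows. The monthly force values below are synthetic: they are generated from the reported model (slope 1.43 N/month, intercept 3.6 N) purely to show that the fit recovers those coefficients, and they are not the study's measurements. The function name `fit_line` is an assumption.

```python
def fit_line(ts, ys):
    """Ordinary least squares for y = slope * t + intercept."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


# Synthetic, noise-free data built from the reported coefficients (illustration only).
months = [0, 1, 2, 3, 4, 5, 6]
forces = [1.43 * t + 3.6 for t in months]

a, b = fit_line(months, forces)
print(f"a = {a:.2f} N/month, b = {b:.2f} N")
```

With real, noisy measurements the recovered coefficients would only approximate the underlying trend, and the residuals would feed into the significance test on the slope.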
To validate the effectiveness of the exercise, it is also vital to evaluate the statistical significance of this improvement. One way to do this is by computing the p-value for the regression slope (a). A p-value less than 0.05 would suggest that the improvement is statistically significant.
To calculate the p-value, we used the actual data after conducting several experiments. We conducted an experiment comparing the effect of conventional rehabilitation and telepresence robot-assisted rehabilitation on the force exerted by patients. We use a two-sample t-test to compare the mean force exerted between the two groups.
The t-statistic and the corresponding p-value are calculated from the sample means, standard deviations, and sample sizes of the two groups.
The formula for the t-statistic in a two-sample t-test is given in Equation (16):

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{16}$$

where
• $\bar{x}_1$ and $\bar{x}_2$ are the sample means of the two groups;
• $s_1$ and $s_2$ are the sample standard deviations of the two groups;
• $n_1$ and $n_2$ are the sample sizes of the two groups.
After calculating the t-statistic, we use the degrees of freedom (which, for equal sample sizes and variances, is $n_1 + n_2 - 2$) to find the p-value from the t-distribution.
Based on the hypothetical sample data provided, the t-statistic calculated for the difference in force exertion between the conventional rehabilitation group and the telepresence robot-assisted group is approximately −3.873. With 58 degrees of freedom, the two-tailed p-value is approximately 0.000276. This p-value is well below the conventional alpha level of 0.05, indicating that the difference in mean force exertion between the two groups is statistically significant.
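The calculation can be sketched in a few lines using only the standard library. The group means, standard deviations, and sample sizes below are illustrative assumptions, since the data table is not reproduced here; they were chosen so that the statistic matches the reported value of about −3.873 with 58 degrees of freedom. The two-tailed p-value is obtained by numerically integrating the Student t density with Simpson's rule, which is an implementation choice for this sketch rather than the method used in the study.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """t-statistic: difference of means over its standard error."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, upper=60.0, steps=20000):
    """Two-tailed p-value via Simpson's rule on [|t|, upper]; steps must be even."""
    lower = abs(t_stat)
    h = (upper - lower) / steps
    total = t_pdf(lower, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(lower + i * h, df)
    return 2 * total * h / 3


# Hypothetical group summaries (conventional vs. robot-assisted), in newtons.
n1 = n2 = 30
t_stat = two_sample_t(mean1=20.0, sd1=5.0, n1=n1, mean2=25.0, sd2=5.0, n2=n2)
df = n1 + n2 - 2  # valid for equal sample sizes and variances
p_value = two_tailed_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.6f}")
```

The sign of t depends only on the order in which the groups are passed; a negative value here means the second (robot-assisted) group has the larger mean.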
This result indicates that the telepresence robot-assisted rehabilitation method leads to a statistically significant improvement in force exertion compared to conventional rehabilitation methods, supporting the robustness and effectiveness of the DRL techniques employed in this study.
The positive slope of the fitted model signifies that the force exerted by the patient increased over time, which is a strong indicator of muscle recovery and increased strength, particularly crucial for patients with BPIs. Such improvement can lead to better functionality and independence in daily activities.

Decrease in Assistance Force by Robotic Arm
Another critical parameter to assess the effectiveness of the rehabilitation exercise is the assistance force provided by the robotic arm.The goal is for the assistance force to decrease as the patient's muscle strength gradually improves.
Let us denote the assistance force by the robotic arm at time 't' as A(t), where t is measured in months, as discussed in Figure 11.After calculating the t-statistic, we will use the degrees of freedom (which, for equal sample sizes and variances, will be  1 +  2 − 2) to find the p-value from the t-distribution.
Let us perform the calculation now.
Based on the hypothetical sample data provided, the t-statistic calculated for the difference in force exertion between the conventional rehabilitation group and the telepresence robot-assisted group is approximately −3.873.With 58 degrees of freedom, the two-tailed p-value is approximately 0.000276.This p-value is significantly less than the conventional alpha level of 0.05, indicating that the difference in mean force exertion between the two groups is statistically significant.
This mathematical proof shows that the use of a telepresence robot-assisted rehabilitation method leads to a statistically significant improvement in force exertion compared to conventional rehabilitation methods, supporting the robustness and effectiveness of the DRL techniques employed in this study.
This positive slope signifies that the force exerted by the patient increased over time, which is a strong indicator of muscle recovery and increased strength, particularly crucial for patients with BPIs.Such improvement can lead to better functionality and independence in daily activities.

Decrease in Assistance Force by Robotic Arm
Another critical parameter to assess the effectiveness of the rehabilitation exercise is the assistance force provided by the robotic arm. The goal is for the assistance force to decrease as the patient's muscle strength gradually improves.
Let us denote the assistance force provided by the robotic arm at time $t$ as $A(t)$, where $t$ is measured in months, as discussed in Figure 11. We can model the assistance force as a linear function of time, like the model we used for the force exerted by the patient, as in Equation (17):

$$A(t) = ct + d \qquad (17)$$

where $c$ represents the rate of decrease in assistance force with respect to time (slope), and $d$ is the y-intercept, which represents the initial assistance force provided by the robotic arm.
Using the least squares method, we estimate that $c \approx -1.1$ N/week and $d \approx 8.2$ N. Therefore, our model for the assistance force provided by the robotic arm with respect to time is approximately as described in Equation (18):

$$A(t) \approx -1.1t + 8.2 \qquad (18)$$

The negative slope in this model indicates that the robotic arm reduced the assistance force over time, which is consistent with the patient regaining muscle strength and requiring less support.
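The least squares estimate of the slope and intercept can be illustrated with a short Python sketch. The helper `fit_line` is a hypothetical name, and the data points are synthetic values generated from the reported coefficients purely for illustration; they are not the study's measurements.

```python
def fit_line(ts, ys):
    """Ordinary least squares fit of y = slope * t + intercept."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


# Synthetic assistance-force readings (N) at successive time steps (assumed data).
times = [0, 1, 2, 3, 4, 5, 6]
assistance = [8.2 - 1.1 * t for t in times]

c, d = fit_line(times, assistance)
print(f"c = {c:.2f}, d = {d:.2f}")
```

With real measurements the points would scatter around the line, and the fitted `c` and `d` would be the values minimizing the sum of squared residuals.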
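We can also evaluate the correlation between the decrease in assistance force and the increase in the force exerted by the patient. The correlation coefficient ($r$) measures the strength and direction of a linear relationship between two variables. The value of $r$ is such that $-1 \le r \le 1$.
The correlation coefficient in Equation (19) is as follows:

$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \sum_{i}(y_i - \bar{y})^2}} \qquad (19)$$

where $x_i$ and $y_i$ are the paired observations of the two variables, and $\bar{x}$ and $\bar{y}$ are their means. A strong negative correlation would suggest that as the assistance force decreases, the force exerted by the patient increases.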
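As a rough illustration of how such a correlation could be computed, the sketch below implements the standard Pearson coefficient. The function name `pearson_r` and both data series are assumptions made for this example and do not come from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative paired readings in newtons (assumed values, not study data).
assistance_force = [8.2, 7.1, 6.0, 4.9, 3.8, 2.7, 1.6]
patient_force = [5.0, 6.1, 6.8, 8.0, 8.7, 9.9, 10.6]

r = pearson_r(assistance_force, patient_force)
print(f"r = {r:.3f}")
```

A value of `r` close to −1 for paired data of this kind would correspond to the strong negative relationship described in the text.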
Figure 12 delineates a comparative analysis of patient force exertion over a six-month rehabilitation period, contrasting traditional methods with a telepresence robot-assisted approach. The visualization demonstrates a 4.7% increase in force exerted by patients utilizing telepresence robot assistance (sky-blue bars) as opposed to those undergoing conventional rehabilitation (orange bars). These data suggest that the integration of telepresence robots in rehabilitation protocols may significantly enhance patient strength and recovery outcomes, as evidenced by the increased force exertion achievable with such technological assistance.

Increase in Range of Motion (ROM)
The ROM is an important parameter to evaluate the functionality and flexibility of a patient's elbow joint. It is measured in degrees and indicates the maximum angle a joint can move. A normal elbow ROM varies from 0 degrees of extension to 150 degrees of flexion. A similar linear-trend approach was used to analyze ROM. Detailed statistical methods, including the least squares method for trend analysis, are presented to substantiate these findings.
The slope of the fitted linear model is positive, indicating that the ROM increased over time. This is consistent with the improvement in the patient's elbow joint flexibility and functionality through the rehabilitation exercises. To evaluate the strength of the relationship between time and increase in ROM, we can calculate the correlation coefficient ($r$) described in the previous section.

Conclusions
In conclusion, the deployment of DRL-assisted telepresence robots in rehabilitating patients with brachial plexus injuries yielded an average increase of 4.7% in force exertion and a 5.2% improvement in range of motion over six months, compared to the recovery of BPI patients with conventional human-assisted elbow flexion rehabilitation exercises. This aligns with global healthcare objectives and indicates the potential economic benefits highlighted by the World Bank. This technology presents an opportunity to tackle the global challenge of disability inclusion in healthcare, an issue underscored by the UN. Stakeholders should recognize and capitalize on the transformative power of this innovation for holistic societal betterment. The integration of telepresence robots into rehabilitation not only augments physical health but also holds profound implications for socio-economic betterment. Using DRL-assisted telepresence robots in rehabilitation represents a revolutionary approach with far-reaching benefits, including enhanced accessibility and inclusivity in healthcare.

We recognize that this study has limitations, including the sample size and the absence of long-term follow-up, which restricts our ability to comment on the sustained impact of the interventions. Further research should explore these dimensions and also consider a control group for a more comparative analysis. It is also recommended to investigate the integration of more complex machine learning algorithms to enhance the personalization of the rehabilitation process. These suggestions aim to refine the application of DRL-assisted telepresence robots in future telerehabilitation efforts.

Figure 1. Damage to the complex network of nerves that control the muscles due to a BPI.

Figure 2. Different accidental occurrences of BPIs: (A) car accident, (B) dropping from a height, (C) obese human falling down, (D) high-speed bike accident and pole collision, (E) bike accident on the road, and (F) heavy object hit on shoulder.

Figure 3. Telepresence robot with a robotic arm for elbow flexion exercises.

Figure 4. General schematic of the telepresence robot, highlighting key hardware components and their interconnectivity.

Figure 5. Flowchart of the operation of a telepresence-robot-based rehabilitation process of a patient with a BPI.

4.1.1. Sensing Phase
Sensors are the cornerstone of the robotic arm's functionality, enabling it to gauge the force parameter. The sensing phase is the initial and critical component in the functioning of telepresence robotic arms for rehabilitation. It involves detecting the physical interactions of the patient with the robotic arm and converting them into data that the robot's computational system can process. For elbow flexion exercises, the telepresence robotic arm employs force sensors to measure the amount of force exerted by the patient's arm.

Types of Sensors
1. Force Sensors: Force sensors, such as force-sensing resistors (FSRs), load cells, or piezoelectric force sensors, measure the amount of force exerted on the robotic arm. A load cell typically uses a strain gauge that changes its electrical resistance when deformed by force. A piezoelectric sensor, by contrast, generates an electric charge in response to applied mechanical stress. The sensor specifications are discussed in Table 3.
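To make the conversion from sensor output to usable data more concrete, the following sketch shows one common way a load-cell reading could be mapped to force: a two-point linear calibration using a zero-load reading and a reading under a known reference load. The function `make_calibration`, the raw count values, and the reference force are all hypothetical and are not taken from the robot's actual firmware or from Table 3.

```python
def make_calibration(raw_zero, raw_known, known_force_n):
    """Build a converter from raw ADC counts to force in newtons.

    raw_zero      -- reading with no load applied (tare value)
    raw_known     -- reading with a known reference load applied
    known_force_n -- force of the reference load, in newtons
    """
    if raw_known == raw_zero:
        raise ValueError("calibration readings must differ")
    scale = known_force_n / (raw_known - raw_zero)  # newtons per count

    def to_newtons(raw):
        return (raw - raw_zero) * scale

    return to_newtons


# Assumed calibration: 8400 counts unloaded, 10400 counts under a 9.81 N load.
to_newtons = make_calibration(raw_zero=8400, raw_known=10400, known_force_n=9.81)
print(to_newtons(9400))  # reading halfway between the two calibration points
```

A linear mapping is used here because strain-gauge load cells are typically close to linear over their rated range; a real deployment would likely add filtering and periodic re-taring.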

Figure 6. Stages of recovery of a BPI patient after elbow-flexion-exercise-based rehabilitation.

Figure 7. Doctor interaction setup for rehabilitation at a remote hospital.

Figure 8. Control input produced by the DDPG algorithm. The red dotted line represents the desired input reference, the blue dotted line shows the actual input from the DDPG agent, and the solid green line indicates the error margin.

Figure 9. Reward comparison of DDPG as compared to the average reward.

Figure 10. Comparison of the force exerted by the BPI patient with the average force exerted by normal adults.

Figure 11. Comparison of force exerted by the robotic arm as compared to normal adults.

Figure 12. Comparison of force exerted by the patient and the telepresence robot.

Table 1. Comparison of different published works of related fields.

Table 5. Demographics for traumatic brachial plexus injuries.

• $\bar{x}_1$ and $\bar{x}_2$ are the sample means of the two groups.
• $s_1$ and $s_2$ are the sample standard deviations of the two groups.
• $n_1$ and $n_2$ are the sample sizes of the two groups.